X 



rameters of a graphite nitridation experiment is studied in the context of 



Bayesian experimental design for the active nitridation of 
graphite by atomic nitrogen 

Gabriel Terejanu, Rochan R. Upadhyay, Kenji Miki 

I Center for Predictive Engineering and Computational Sciences (PECOS), 

I Institute for Computational Engineering and Sciences (ICES), 

f— ^ The University of Texas at Austin 

»— » 

Abstract 

The problem of optimal data collection to efficiently learn the model pa- 
ck 

Bayesian analysis using both synthetic and real experimental data. The pa- 
per emphasizes that the optimal design can be obtained as a result of an in- 
£>~, formation theoretic sensitivity analysis. Thus, the preferred design is where 

Q^ the statistical dependence between the model parameters and observables 

is the highest possible. In this paper, the statistical dependence between 
random variables is quantified by mutual information and estimated using 
^ a k nearest neighbor based approximation. It is shown, that by monitoring 

the inference process via measures such as entropy or Kullback-Leibler di- 
(^ vergence, one can determine when to stop the data collection process. The 

i-h methodology is applied to select the most informative designs on both a sim- 

ulated data set and on an experimental data set, previously published in the 
literature. It is also shown that the sequential Bayesian analysis used in the 
experimental design can also be useful in detecting conflicting information 
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between measurements and model predictions. 
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1. Introduction 

The paper examines the problem of optimal data collection in order to 
calibrate a graphite nitridation model. The calibration of the model is done 
in the context of Bayesian framework, where the probability distribution of 
the uncertain parameters is inferred from observations using Bayes' theorem. 
Here, the goal of the experimental design is to select the optimal values for 
the control variables which can be tuned by the experimentalist such that 
the post-experimental uncertainty in the model parameters is reduced. 

Since pre-experimental decisions have to be made before any measure- 
ments are taken, it is convenient to frame the design problem also in the 
Bayesian framework, in order to average over the unknown future observa- 
tions. A comprehensive and unified view of Bayesian experimental design is 
given in Chaloner and Verdinelli (1995). In this paper, the optimal design is 
found to maximize the expected utility when the purpose of the experiment 
is to estimate the parameters of the mathematical model. Thus the utility 
function employed in this work is based on Shannon's measure of informa- 
tion, Shannon (1948), which reflects our goal for parameter estimation as 
described by Lindley (1956). Other utility functions can be tailored when 
the purpose of the experiment is prediction, hypothesis testing or mixed 
objectives including minimizing the financial cost of the experiment. 

The proposed approach is applied to a sequential design when the ac- 
quisition of the data can be done in the context of experiments on-demand. 



Such a sequential approach is described by Fedorov (1972) and Loredo and 
Chernoff (2003) which clearly identify the three stages of the experimental 
process: experiment - inference - design. The data already collected is used 
to update the probability density function (pdf ) of the parameters, and the 
result of this inference can be further used in identifying the next design 
which best resolves our questions of interest in the model. After the exper- 
iment is performed, the cycle continues with a new inference step, followed 
by a design step and so on until the experimental objective is reached. 

Due to the high computational complexity of the Bayesian experimental 
design, its adoption is rather low especially when dealing with nonlinear 
models. A large body of work can be found in the literature on optimal 
design for linear or linearized models under the name of Bayesian alphabetic 
criteria, see DasGupta (1995). However in the recent years a revived interest 
in Bayes optimal designs for nonlinear models can be attributed to Bayesian 
recursive update, efficient estimators of information-theoretic measures and 
to Markov chain Monte Carlo (MCMC) algorithms which can efficiently 
sample complicated posterior distributions. In Muller (1999), the author 
makes use of MCMC to create a new simulation-based design, approach 
which jointly samples from an artificial probability model on the design, 
data and parameters. 

By exploiting the independence property of the noise from the design, 
Sebastiani and Wynn (1997) extended the maximum entropy sampling pro- 
posed by Shewry and Wynn (1987) to estimation problems. They show 
that the experiment which provides the maximum amount of information 
for model parameters is the one for which the predictive distribution has the 
largest entropy. In other words, the most informative experiment is the one 
where we know the least. A similar information-theoretic approach is pre- 



sented by Farhang-Mehr and Azarm (2002), which maximizes the entropy 
of Gaussian priors. These methods are aligned with previous ideas found 
in Lindley (1956) of incorporating information-theoretic approaches in the 
experimental design process. 

The main contribution of this paper is to emphasize the intuitive inter- 
pretation of the Bayesian experimental design as an information-theoretic 
sensitivity analysis, methodology which is applied to an engineering prob- 
lem for which real experimental measurements exist. It is shown that the 
optimal design for estimation problems is the one which maximizes the mu- 
tual information of the parameters and the future observations. Given the 
connection between mutual information and copula functions used to model 
statistical dependencies, see Calsaverini and Vicente (2009), this new inter- 
pretation of Bayesian experimental design reveals that optimal sampling for 
parameter estimation can be yielded by an information-theoretic sensitivity 
analysis, see also Appendix B. Thus the optimal design occurs where the 
statistical dependence between observables and parameters is maximized. 
In contrast to the maximum entropy sampling, no assumptions are made 
about the functional dependence of the entropy of conditional distribution 
on the design. 

In the inference stage, we use an adaptive MCMC algorithms proposed 
by Cheung and Beck (2009) to obtain samples from the posterior distri- 
bution of model parameters. Estimators based on fc-nearest neighbor are 
used to compute the information theoretic measures required in the design 
stage. The use of these estimators is advantageous when only samples are 
available to describe the underlying distributions. The mutual information 
is estimated using Kraskov's approach, see Kraskov et al. (2004), which 
extends the fc-nearest neighbor based estimator for differential Shannon en- 



tropy developed by Kozachenko and Leonenko (1987). 

While papers on experimental design can be found in geoscience - Guest 
and Curtis (2009), neuroscience - Paninski (2005), biomedical applications - 
Clyde et al. (1996), Chung and Haber (2011), Horesh et al. (2011), engineer- 
ing - Tucker (2008), just to name a few, the application of Bayesian design 
principles to actual experiments still lags far behind the theoretical advance- 
ments, see Chaloner and Verdinelli (1995). In Curtis and Maurer (2000), 
the authors compare the small number of papers published on average per 
year on experimental design to the large number of papers published on in- 
verse methods, emphasizing the disconnect between the amount of data and 
the amount of information contained in the data. A more efficient learning 
of model parameters can be accomplished by using Bayesian experimen- 
tal design which tightly couples the computational modeling, experimental 
endeavors and data analysis. 

In this work, we consider an experiment for the nitridation of graphite 
conducted by Zhang et al. (2009). The main objective of the experiment 
is to measure the reaction rate of the graphite with active nitrogen. This 
quantity is of great importance for assessment of the effectiveness of the 
thermal protective system of space vehicles. The main parameter of interest 
is the reaction probability of the graphite nitridation reaction. The values 
of this quantity presented in the literature (Park and Bogdanoff (2006), 
Goldstein (1964), Suzuki et al. (2008), Zhang et al. (2009)) vary by several 
orders of magnitude. The experiments reported in Zhang et al. (2009) have 
a detailed description of the scenarios with readily available data. The 
authors conducted a series of runs at different conditions that are not based 
on experimental design considerations. Hence the effectiveness of a rigorous 
experimental design strategy can be demonstrated. 



In Section 2 the problem of optimal experimental design is described 
in the Bayesian framework. The experimental setup for the nitridation of 
graphite is detailed in Section 3, followed by the model description in Section 
4. The numerical results for both simulated data and real experimental data 
are presented in Section 5, and the concluding remarks are given in Section 6. 

2. Description of the experimental design 

Given a set of observations D n —\ = {di, d2, . . . , d„_i} we are concerned 
with finding the next experimental design £ n £ S such that the model 
parametric uncertainty is reduced after the experiment is performed and 
the associated measurement data d n is collected. The mathematical models 
used in this paper are generally represented by the following abstract model: 

r(u,0,£,e s ) = (1) 

d = d(u,0,£,e m ) (2) 

Here u is the state of the system which obeys the governing equations de- 
fined by r(-), £ G H is the control variables associated with the experimental 
scenario, 3 is the design space and 6 G Q are the model parameters, where 
is the parameter space of the model. Here, the model prediction d„ calcu- 
lated using the measurement model d(-) is comparable with the experimental 
data d„ for a particular scenario input £ n . The random vector or random 
field e s captures any stochastic forcing present in the governing equations 
such as unmodeled dynamics and e m models the discrepancy between model 
predictions and experimental data. 



2.1. Bayesian experimental design 

In the followings it is assumed that we can afford to perform up to 
N experiments to obtain the desired measurements. Preferably, one will 
want to perform as few informative experiments as possible in order to 
calibrate the model. The experimental design process employed in this paper 
generates an optimal sequence of designs such that the information gain is 
maximized after each experiment. More about this sequential process of 
designing experiments can be found in Fedorov (1972). Note that since this 
decision process is sequential, and the next design depends on the previous 
experiments, it does not guarantee to find the optimal N designs among 
all possible combinations of N designs. This would correspond to a batch 
update of the prior parametric uncertainty and an expensive combinatorial 
decision process, which would require a large number of Monte Carlo samples 
to estimate the necessary expected utilities for each possible combination of 
designs. 

The entire experimental process is divided in three stages: experiment 
stage, inference stage and, design stage. In the experiment stage, new data 
is collected according to the strategy obtained in the previous design stage. 
Initially, this data can also be obtained from any experiments documented 
in the literature that are related with the underlying modeling problem. In 
the inference stage, the newly obtained experimental data is used to update 
the prior pdf of the model parameters using Bayes rule. The resulting pdf 
is further used to predict the distribution of the future observations for a 
variety of scenarios which cover the design space. The best design is chosen 
by maximizing the expected Shannon information gain, expectation taking 
with respect to the distribution of future data. The experimental process is 
depicted in Fig.l. 



2.2. Bayesian inference step 

The inference stage consists in applying Bayes's theorem to calculate the 
distribution of the model parameters conditioned on all the available data, 
p{6\D n ). Using the conditioning rule, one can derive the following recursive 
Bayesian update: 

mDn) = p(D n ) 

p(d ra ,.D ra -i|g)p(fl) 

p(d n ,D n _i) 
p(d ra |£) w -i,g)p(D n _i|g)p(0) 
p(d n \D n -i)p(D n -i) 

p(d n \D n ^,0,UP^\Dn-l) (3) 

p(d„|D n _i,£J 
Under the assumption of conditionally independent measurements given 
model parameters, then the above expression can be simplified using the 
following relation: p(d n \D n -i,6,£ n ) = p(d n \d, £ n ). The denominator is 
given by, 

Kdnl £>„_!, £J = / p(d n \e, i n )p(0\Dn-l)&0 (4) 

Je 
and quantifies the evidence provided by the new experimental data, d n , in 

support of our model conditioned on all the previous measured data. The 
use of Bayesian inference in the sequential experimental design is advanta- 
geous as it permits the iterative accumulation of information. The posterior 
distribution obtained in this stage becomes the prior distribution in the next 
stage of inference. After all the N experiments have been performed, the 
last posterior distribution summarizes the information contained in all the 
available measurements. 

Instead of performing iV experiments, the data collection process can be 
stopped earlier, when precision measures computed in the interim stages, 



such as the determinant of the sample covariance matrix, satisfy the thresh- 
olds predefined by the user. The process can also be ceased when the rate 
of reducing the uncertainty in the parameters has slowed enough or when 
indication exists that model predictions and the corresponding experimental 
observations are in disagreement. 

The inverse problem of calibrating the model parameters from the mea- 
surement data is solved using Markov chain Monte Carlo simulations. In 
our simulations, samples from the posterior distribution 6 ~ p(0\D n ) are 
obtained using the statistical library QUESO Prudencio and Schulz (2011) 
equipped with the Hybrid Gibbs Transitional Markov Chain Monte Carlo 
method proposed by Cheung and Beck (2008, 2009). 

2.3. Optimal experimental design step 

According to Lindley (1956) when the objective of the experiment is to 
learn about the model parameters 0, then the utility function is given by 
the amount of information provided by the measurement d n as result of 
performing the experiment with the design £ n . Since d n has not yet been 
observed, and we have to make a decision regarding the design £ n prior to 
the experiment, in the following, we are using the unknown measurement 
d n and its corresponding predictive distribution. 

C/(d n ,0 = f p(0\d n , £> n _i) logp(0|d n ,D n _i)d0 
J® 

- f p(e\D n ^)\ogp(e\D n ^)d6 (5) 

J® 

Here, the prior pdf for model parameters is given by p(0\D n -i), and it is 
used to compute the following predictive distribution for the observables 
when the input scenario is £ n : 

p(d n |D n _i,0= [ p(d n \9,$ n )p(0\Dn-i)de (6) 

Je 



Since we are in the pre-experimental stage, we can compute the average 
amount of information provided by an experiment £ n by marginalizing over 
all the unknown future observations: 



E d „[E/(d n ,£ n )] = / [/(d n ,Op(d„|D n _!,Odd n 
Jv 



(7) 



The optimal experiment is obtained by solving the following optimization 
problem: 

£; = argmaxE dn [[/(d„,0] (8) 

By substituting Eq.(5) into Eq.(7), the expected utility can be written 
in the following form: 

E dn [u(d n ,u] = I I p(^,d n |D w _ 1 ,^)iog p( ^ d "Jf"- 1 ;^ ) d0dd ra 

JvJ® P( d n|Ai-l,«J 

-/ f p(6>,d n |D n _i,^ n )logp(6>| J D n _i)d6>dd„ (9) 

Jv J® 

= / / p(0,d n |i? n _i,| n )log— — ^^d^dd,, 

= I^jdnl^-i,^) (10) 

Therefore the optimal experiment is the one which maximizes the mutual 
information between the model parameters and the model predictions. In 
other words we would like to sample where the expected observations have 
the highest impact on model parameters. 

£* =argmaxI(0;d n |.D n _i,£ n ) (11) 

From information theory, see, for example Cover and Thomas (1991), 
mutual information quantifies the reduction in uncertainty that knowing 
the model parameters provides about the model predictions and vice-versa. 
Being written as the Kullback-Leibler divergence between the joint distri- 
bution of parameters and predictions, and the product of their marginals, 
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we can also say that mutual information provides a measure of statistical 
dependence between the two random variables. This has been shown more 
formally in Calsaverini and Vicente (2009) by making the connection be- 
tween mutual information and copula functions which are used to model the 
dependence between random variables. For completeness this connection is 
also presented in Appendix B. It turns out that the mutual information is 
just the negative copula entropy. Hence, mutual information is indepen- 
dent on the marginal distributions and it quantifies only the dependence 
information contained in the copula function. 

In neuroscience, Paninski (2005) uses the same Bayesian optimal design 
based on maximizing the mutual information between model parameters and 
model predictions. It is shown that the information-maximization sampling 
is asymptotically more efficient than an independent and identically dis- 
tributed sampling strategy. Compared with maximum entropy sampling for 
parameter estimation proposed by Sebastini and Wynn (2000), the sampling 
based on maximizing the mutual information is more general, no assumption 
about the independence assumption is made about the discrepancy model 
with respect to the design. 

Finally, in the design step, the experimental space S is discretized and 
the corresponding marginal distributions of model predictions and parame- 
ters, as well as the joint pdf of the two are computed. The mutual informa- 
tion corresponding to each experimental design is calculated as in Subsection 
2.4, and the design corresponding to the highest MI is selected for the next 
experiment. Here, samples from the joint distribution of parameters and 
model predictions are obtained using Monte Carlo simulations according to 
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the following relation: 

p{d n ,O\D n - 1 ,S n )=p(d n \0,S n )p(0\D n - 1 ) (12) 

2.4- Estimating the mutual information 

The estimation of the mutual information in this paper is done using the 
/cNN (Nearest Neighbor) method proposed by Kraskov et al. (2004). This is 
based on the ANN estimator for the entropy proposed by Kozachenko and 
Leonenko (1987). The entropy of a random variable can be approximated 
from N samples using Kozachenko-Leonenko's estimator: 

, N 

Hpo « -ip(k) +m) + ^Yl lo § e « ( 13 ) 

i=l 
where ip(-) is the digamma function, dx is the dimensionality of the random 
variable X, and e(i) is the distance from sample Xi to its fcNN, Xy{\, and 
is given by the maximum norm: 

e(i) = WXi-X^Woo (14) 

This is a biased estimator due to the assumption of local uniformity of 
the density. By definition, the MI can be decomposed as the sum of marginal 
entropies minus the joint entropy. Therefore, the above estimator can be 
used to approximate all three entropies and thus the mutual information. 
However the errors made in estimating individual entropies will not cancel 
out in general. To partially alleviate this problem, Kraskov's approach is 
not to fix k when estimating the marginal entropies. It can be shown, see 
Kraskov et al. (2004), that the MI of the parameters and model prediction 
can be computed as follows: 

1 N / \ 

1(0; d„| £>„_!, U w i>{k) - - J2[ ^(Mi) + 1) + 1>{n d (i) + 1) J + i/;(N) (15) 
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where nd(i) and ng(i) are the number of points in the marginal space within 
e(i) distance from the ith sample. Here, e(i) is the distance from the ith 
sample to its A;NN in the joint space. 

Based on numerical studies, Khan et al. (2007) shows that the estimators 
based on /cNN and kernel density estimation (KDE) outperform commonly 
used dependence measures such as linear correlation, cross-correlogram or 
Kendall's r. Furthermore, it is also shown that in general the /cNN esti- 
mator captures better the nonlinear dependence than the KDE. However 
the Kraskov's MI estimator has to be used with care. A small value for k 
will result in a small bias but a larger variance and vice-versa. Also, the 
efficiency of the estimator decreases as the dimensionality of the joint space 
increases. 

For completeness, an example test has been carried out to show the per- 
formance of the /cNN estimator for the MI as function of the dimensionality 
for the joint space, the number of the samples and the number of /cNN's. 
The joint pdf of two random variables of known dimensions is modeled us- 
ing a Gaussian density function for which we can compute the exact MI. 
For each combination of joint dimensionality, fcNN, and sample size, a num- 
ber of 1000 trial runs have been used to generate samples according to the 
Gaussian density function. The entries in the covariance matrix have been 
considered to be 1 on the diagonal and 0.9 for the off-diagonal elements. 
Using the true value of the MI and the estimated value, one can compute 
the relative absolute error for each run. The result of the average absolute 
relative error over the 1000 trial runs is shown in Table 1. 
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2.5. Example 1: Simulation experiment 

Consider the following nonlinear design problem: 

d = 0i£ + 2 £ 2 + /(Oe (16) 

The goal of the experiment design in this example is to efficiently learn 
the model parameters, 6\ and #2, when the design space is given by 5 = 
[—1,1]. The discrepancy model is given by /(£)e, where e is a standard 
normal random variable, JV(0, 1). The two parameters are uniformly dis- 
tributed according to 9\ ~ U{\,2) and #2 ~ W(3, 4). Two cases are ana- 
lyzed: additive noise when /(£) = 0.01 and multiplicative noise with respect 
to the design when /(£) = 0.5(1.1 — |£|). Given the initial uncertainty of 
the parameters and the discrepancy function, the response of the model it is 
shown in Figs.2(a)-2(b). In this respect we want to compare the maximum 
entropy sampling with the information maximization sampling described in 
this paper. Hence, the two cost functions to be maximized in the design 
stage are, 

• J(£n) = I(($i> $2); d n \D n -i,£ n ) for information maximization (IM) 
sampling 

• J{£n) = H(<i n |.D n _i,£ n ) for maximum entropy (ME) sampling 

The two sampling approaches, IM and ME, are also compared with a 
strategy based on random sampling (RND), which at each stage it randomly 
chooses a design. All the designs being equally likely at every stage. 

In this example we consider that up to 10 experiments can be performed 
with scenario values equally distributed in the design space as indicated 
by the dashed lines in Figs.2(a)-2(b). For each case corresponding to the 
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discrepancy function, additive noise or multiplicative noise, two other cases 
are analyzed with respect to experimental sampling restriction. In the first 
case no repeated measurements are allowed. Thus, once an experimental 
design has been chosen from the set of 10 designs, it can no longer be used 
in the subsequent stages. The second case corresponds to allowing repeated 
measurements in our calibration. Synthetic measurements are created using 
Eq.(16), where the true value of the parameters are set to 9± = 1.5 and 
62 = 3.5. A different noise sample has been generated for each of the 10 
scenarios. For this example we have used 2000 samples in computing the 
MI and the kNN has been set to 6. 

For the two cases considered in this example (additive and multiplica- 
tive noise) each strategy will yield a sequence of designs. At each stage of 
the design, the performance of each strategy is given by the reduction in 
uncertainty of the parameters which is quantified by the entropy of the pa- 
rameter distribution. The entropy is computed here using Eq.(13) and for 
comparison purposes, since we are generating synthetic measurements and 
use a random sampling approach, the entropy is averaged over 100 Monte 
Carlo runs. 

As expected, for the additive noise the maximum entropy sampling 
is equivalent to the information maximization sampling as it is shown in 
Fig. 3(a) and Fig. 3(c). Table 2 shows that in this case the highest initial re- 
duction in uncertainty is obtained by starting the sampling at the boundary 
of the design domain. For multiplicative noise however, due to the discrep- 
ancy model used, the maximum entropy is forced to start the sampling using 
the design points in the middle of the domain, where the uncertainty is the 
highest, see Table 3 and Fig. 2(b). Because of the distribution of the noise, 
the performance of the maximum entropy strategy is worse on average even 
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compared with the random sampling. 

A faster reduction in uncertainty is obtained in this case using the in- 
formation maximization sampling as it is shown in Fig. 3(b) and Fig. 3(d). 
For both additive and multiplicative noise cases, an accurate approximation 
to the true value of the parameters is given by the information maximiza- 
tion sampling using only two experiments as it is shown in Figs.4(a)-4(d). 
Therefore, an efficient learning of model parameters in the general nonlinear 
design problems can be achieved using information maximization sampling. 

Regarding the experimental sampling restriction, in both cases when we 
allow and do not allow repetitive measurements the performance of IM is 
superior when compared with ME and RND, see Fig. 3. When repeated 
measurements are allowed and we have additive noise, both IM and ME 
consistently select the same boundary design points, see Table 2. For mul- 
tiplicative noise and repeated measurements, we have the same consistency 
in choosing the designs, IM selecting the boundary design points while ME 
selecting the middle design points, which in this case are the least informa- 
tive designs. This explains the poor performance of ME compared with IM 
and RND, see Fig.3(d). 

3. A description of experimental setup and data 

The experimental scenario is described in much detail in Zhang et al. 
(2009). For completeness, we briefly describe the experimental set-up, the 
measurements and their associated uncertainties. The experiments were 
performed to measure the reaction probability of the graphite nitridation 
reaction, Cq t + N — > CN. The high-purity graphite (grade DFP-2) in the 
form of 3.175 mm diameter rods was used as the sample. The sample was 
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placed in a furnace-heated quartz tube shown in Fig. 10. A quartz flow 
tube (22 mm inner diameter) extended through a high-temperature tube 
furnace with a hot region of approximately 45.5 cm. Atomic nitrogen was 
generated by flowing nitrogen through a microwave discharge located up- 
stream of the furnace. The microwave discharge was operated at a constant 
power of around 90-95 W under steady-state temperature and gas flow con- 
ditions. The pressure was measured before and after the furnace using a 10 
torr Baratron capacitance manometer. The data sheet from the instrument 
reports uncertainty as ±1% of the recorded value. The temperatures of the 
flow tube were measured using Omega stick-on type K thermocouples at the 
location of both pressure ports and near the entrance and exit of the furnace. 
Temperatures were monitored periodically to ensure steady state operation. 
The gas pressure and bulk velocity in the flow tube were varied by simultane- 
ous adjustment of the incoming N2 flow rate with a throttling valve located 
in the downstream pumping manifold. The uncertainty in the flow rate is 
quoted as being no more than 1%. The mole fraction of atomic nitrogen 
entering and exiting the furnace were measured by gas phase titration with 
nitric oxide. In this method, controlled quantities of nitric oxide (NO) is 
introduced into the titration region. The NO reacts extremely rapidly with 
atomic N and the disappearance of the NO signals the titration end point. 

The nitrogen atoms that strike the graphite surface react with the graphite 
sample and cause mass loss. Each sample was weighed immediately before 
it was placed into the flow tube and immediately after it was removed from 
the flow tube, using a Mettler Toledo XP105 analytic balance with a 0.01 
mg resolution. Control experiments were run at various conditions without 
the microwave discharge to measure the extra mass loss that occurs due to 
de-volatilization of some residual hydrocarbons in the sample or reaction 
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with oxygen leaking into the flow system. The measured mass loss was ad- 
justed by subtracting the control mass loss. The mass loss rate was obtained 
by dividing by the time span of the experiment. The most extensive set of 
measurements have been performed at furnace temperature of 1273 K. The 
measurements that are required for the calibration of various parameters 
described in Section 4 are pressure exiting the furnace, and the mass loss of 
the sample. 

4. Model description and uncertain parameters 

In this section, we present the model that is used to compute the mea- 
sured data as a function of model and control parameters, the system defined 
abstractly in Eq. (1) in Section 2. The experiment (Zhang et al. (2009)) 
is briefly described in Section 3. Since the focus of this work is not in the 
detailed physical modeling but in the investigation of most informed exper- 
imental design strategy, we use an extremely simplified model with several 
assumptions proposed by Zhang et al. (2009). The flow is assumed to be 
one dimensional and given by the Hagen-Poisseuille flow model with tem- 
perature varying density given by the ideal gas law. The bulk flow consists 
entirely of nitrogen as the concentrations of atomic nitrogen and cyanates 
are negligibly small. 

Temperature is assumed to not vary with tube radius. The temperature 
along the cross section of a tube is the same as the wall temperature. The 
wall temperature is linearly interpolated between known temperature values 
along the tube length. The extra pressure drop due to the sample holder 
is taken into account by using an effective tube diameter in the Hagen- 
Poisseuille flow model. With these assumptions the pressure, mean velocity 
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and density profiles along the tube are obtained by solving: 

p dP _ 128m N2 ii N2 RT(z) 

dz ~ nd± ff W N2 { ' 

v ( z ) = ( \ J ( 18 ) 

P(z = 0) = Pi; u(z = 0) = I7i; T(z) = ^(z) (20) 

In the above P{z) denotes the temperature as a function of length along 
the tube z, v(z) is the bulk flow velocity, rriN 2 is the mass flow rate of N2, 
p(z) is the density of N2, T{z) is the temperature of the wall and also the 
fluid along the tube given by interpolating measured temperatures (^f(z) is 
the linear interpolant), d e fj is the effective diameter, Wn 2 is the molecular 
weight of N2, Pin and Ui n are the inlet pressure and velocity respectively 
and R is the universal gas constant. In Eq. (17), Sutherland's model is used 

(nr 1 j_Q \ / \ 3/2 

r ^ f +s — J ( Tj, — J . The constants 

appearing in Sutherland's model are also assumed to be known precisely. 

They are: p r ef,N 2 = 0.01781e — 3 Pa s, Sn 2 = 111-0 and T re f = 300.55 K. 

We also need the values of the nitrogen atom concentration as a function 

of z. The nitrogen atom concentration is also assumed to not vary with the 

tube radius. This means that diffusive processes for the N-atom as well 

as the temperature are assumed to be instantaneous. The assumption is 

justified due to the small radius, reasonably high speed flow and relatively 

slow wall recombination reactions. With these assumptions, the iV-atom 

concentration profile along the tube is obtained by solving: 

d(v(z)C N ) -f N (T)v N (T)C N 9h(T , n r 2 , on 

dz = d^f M(T)C N2 C N (21) 

C N (z = 0) = CW,i (22) 
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Cn is the molar concentration of atomic N, 77V is the reaction coefficient 
for reactions with the quartz wall defined below in Eq. (24), k(T) is the 
reaction rate for gas phase recombination of N atoms defined in Eq. (23), 
vn(T) is the thermal velocity of N atoms given by kinetic theory defined 
below in Eq. (25), CV 2 is the concentration of N2 gas. The first term in the 
right hand side models the loss of N atoms as a result of recombination at 
the wall, the second term accounts for the loss of N atoms due to reaction 
in the gas phase to produce molecular nitrogen AT 2 . The models/definitions 
for the varies quantities introduced are: 

EnNN 



k(T) = Annzxp 



RT 



(23) 



TV CO - -,v(7;,/)(^) (24) 



We assume that sublimation is negligible. The only reaction is a global 
first order reaction between the solid graphite and nitrogen atom to give 
CN gas: C(Gr) + N — > CN. In the chosen model, called the Gas Kinetic 
(GK-) model, the transport of the nitrogen atoms to the surface is assumed 
to be instantaneous. The backward reaction is ignored. Hence for the GK - 
model, the mass loss of carbon is given by the following: 

AM C = {Atnd sample W c ) (f' S C N (z)^/f^)(3 N dz\ J ^- (26) 

The main quantity of interest for this experiment is the nitridation coefficient 
/Sjv- ^ is the time interval of the test, d samp i e is the diameter of the sample, 
Wq is the molecular weight of carbon, z s is the sample location, L s is the 
length of the sample and Cjv(z) is the molar concentration of N along the 
sample obtained by solution of Eq. (21). In Eq. (26), only the forward 

20 



equation is considered and the wall concentration of N is the same as the 
free stream concentration. 

In the following, the mass loss and the output pressure d n = {Amc, P2}, 
are used as the observables. They can be obtained as a solution of equations 
(17) and (26) along with the supplementary equations (18) to (25). From 
a review of the literature, we choose the most uncertain parameters to be 
= {d e //,7AT,/0iv}- The stochastic forcing term e s in Eq. (1) is neglected. 
The discrepancy term e m in Eq. (2) is a Gaussian random variable whose 
95% confidence interval is taken to be the estimated error quoted in Zhang 
et al. (2009), and it is used here to define the likelihood function in the 
Bayesian inversion. The control variables are the volumetric flow rate, inlet 
pressure and the inlet N atom mole fraction : £ n = { Fjy 2 ^ n , P±, Xn,i } 
from which the mass flow rate, mjv 2 and inlet nitrogen atom concentration 
Cjv,i defined in Eqs. (18) and (22) can be computed. 

5. Results 

To recapitulate, the most uncertain parameters appearing in the model 
formulation described in Section 4 are the effective diameter d e ff, the reac- 
tion efficiency 7 at and, the nitridation coefficient /3n- To learn the model 
parameters 6 = {d e //,7Ar,/3jv} the mass loss and the output pressure are 
used as observables, d n = {Amc,P2}- In the followings we present the 
experimental design analysis on both simulated data and real experimental 
data as given in Zhang et al. (2009). For these examples, we have used 
1000 samples per level in our adaptive MCMC and 2000 samples at the last 
level. These 2000 samples have also been used in the forward uncertainty 
propagation and in the MI calculation. We have set the A;NN to 6 in the 
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following results. 

5.1. Example 2: Simulated measurements 

In this example we use the model presented in Section 4 to generate 
synthetic measurements for pressure and mass loss by adding noise to model 
predictions using 1% uncertainty for pressure and 5% uncertainty for mass 

loss. To generate the measurements the following values have been used for 

de ff -nat; l™- / 2W -in onri Av 



the normalized parameters: , e// = 0.95, log -^- = 1.0 and, -§^- = 2.0, 
where the nominal values are given by: rf e //o = 0.022, 7jvo = 10~ 3 , and 
P m = 3.29 x 10 _ 3. 

The values for the experimental scenarios (-Fjv 2 ,m) Pi, %n,i, At, L$) are 
the same as in Zhang et al. (2009), and presented in Fig. 7. Similar with the 
previous example three strategies are used to select the optimal sequence of 
designs from the set of 28 designs: maximum entropy sampling, information 
maximization sampling and, ascending sampling which starts with the first 
tabulated design in Zhang et al. (2009) (enumerated in Fig. 7 from E0 to 
E27), and continues providing the next available one at each stage. 

Fig. 8(a) shows the drop in uncertainty versus the number of design 
stages. The design sequences provided by both maximum entropy sampling 
and the information maximization sampling, see Table 4, give a comparable 
learning rate, which is superior to the ascending sampling. Fig. 6 gives the 
relative entropy for ME sampling and relative mutual information for IM 
sampling at each design stage. The difference between the information the- 
oretic approaches and the ascending sampling is that the former ones give 
higher priority to the designs with high inflow. 

ME sampling preferably selects the experimental scenarios with large 
values for inflow conditions, either Fjv 2) j n , Pi, Xn,i- For example, first the 
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scenario of 14 measurement has the largest inflow rate and pressure among 
others. It follows the same type of experimental scenario. A physical ex- 
planation for this trend could be hypothesized as follows. A large value 
of F]\f 2t i n results in the large pressure drop through the experimental re- 
gion. This enhances the sensitivity of pressure on d e ff through Eq.(18). 
Also, a large pressure (e.g., 14 th , 6 th measurements) accelerates chemical 
reactions especially the gas-phase recombination through the three-body 
reaction: N + N + N 2 <->• N 2 + N 2 . The significance of 20 th and 21 s * mea- 
surements preferably selected in the early stage is related to the combination 
of the large volumetric flow rate and relatively small pressure, resulting the 
large velocity. This reduces the residence time of flow and consequently has 
a large influence on the determination of N-atom concentration. On the 
contrary, the scenarios with the small values of inflow conditions (e.g., th , 
1 st measurements) are selected in the later stages. 

On the other hand, IM sampling prefer alternately selecting the exper- 
imental scenarios of the large values and small values for inflow conditions 
after the first 3 scenarios. These difference is clearly seen in the different 
order of selecting and 1 st measurements. It seems to the authors that 
IM sampling performs in a more logical manner by mixing a variety of infor- 
mation. This can be seen in Fig. 7 which provides a graphical representation 
for the inflow conditions preferred by both IM and ME. 

The marginal pdfs of the parameters at different stages are plotted in 
Fig. 9. Note that both information theoretic approaches provide a better 
determination of the model parameters than the ascending sampling ap- 
proach. While including all the 28 measurements we are able to recover 
with good accuracy the model parameters, information maximization can 
provide an adequate estimate after the first two stages, followed closely by 
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the maximum entropy sampling. 

Kullback-Leibler (KL) divergence between the pdfs obtained at the cur- 
rent stage and the previous one is also included in order to monitor the 
evolution of the inference process, see Fig. 8(b). The KL divergence is cal- 
culated using also a /cNN estimator as described in the Appendix A. While 
the entropy quantifies how certain we are about the model parameters, the 
KL divergence quantifies how much information has been gained by adding 
the current measurement. This includes the reduction in uncertainty as well 
as a change in the support of the pdf. Decreasing entropy and large values 
of KL are indicative that the model is still learning. This is equivalent with 
obtaining tighter pdfs at each stage however their high probability support 
is changing from one stage to another. A sequence of low entropies as well 
as low KL divergence indicates that the learning process has slowed down 
and one can stop the experimental process. In this example, by thresholding 
the KL at 0.5 nats, depending on the level of uncertainty, given here by the 
entropy, one can stop the experimental process at any stage greater than 5. 
Thus for simulated data not all the 28 measurements are needed to recover 
the model parameters with a given level of accuracy. 

5.2. Example 3: Study case on existing experimental measurements 

The same analysis as in the previous example is now carried on real 
experimental data as provided in Zhang et al. (2009). As in the simulated 
measurement case, both the information theoretic approaches give design se- 
quences which start with high flow designs. See Table 5 for the corresponding 
design sequences or Fig. 11 for the relative expected utility corresponding 
to the two information theoretic strategies at each design stage. The profile 
of the uncertainty reduction showed in Fig. 13(a) is similar with the one in 
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Fig. 8(a). However, the same cannot be said about the KL divergence plot- 
ted in Fig. 13(b). No matter which design strategy is used, each subsequent 
measurement brings new information which moves the high probability sup- 
port of the pdf from one stage to another, see Fig. 14 for marginal pdfs at 
different design stages. As mentioned in the previous section, the model 
discrepancy is assumed to be Gaussian and thus in this case the likelihood 
function has nonzero tails. Given that the prior is uniform, the subsequent 
posterior pdfs will have nonzero tails over the support of the prior. 

When compared with the simulated data, this behavior is indicative 
of conflicting information between experimental measurements and model 
predictions. In this case the posterior distribution asserts that the high 
probability support of both the prior and the data are highly implausible, 
for more information see O'Hagan and Forster (2004). When dealing with 
real data this is not unexpected given that most of the time we have model 
error or some of the measurements are not self-consistent. In other words, 
either different biases exist in the measurements or the model along with 
the probabilistic assumptions have limited explanatory power of the real 
phenomenon, or both model errors and corrupted measurements can exist 
at the same time. In such situations, the experimental process should be 
stopped after a pre-determined number of experiments to find the cause 
of the disagreement. While one can use a more direct way to check for 
this disagreement, the authors believe that using the KL divergence in this 
settings can provide another useful tool for model calibration diagnostics. 

Generally speaking, the order of measurement sequence picked by ME 
sampling is very close to one with the simulated data case. In other words, 
it picks the scenario based on the large values of either F]\f 2t i n , Pi, Xn,1i see 
Fig. 12. However, IM sampling picks the experimental scenarios in a slightly 
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different way from ones with the simulated data. For example, with the real 
data, IM sampling prefers the large values of inflow conditions in the early 
stage and does not alternatively pick different scenarios until 10 t/l design 
stage. Note 1 st measurement is selected at the second stage, however the 
choice of measurement is further delayed. 

As mentioned above, the large values of the inflow scenario parameters 
are more informative due to the abundance of nitrogen atoms at the surface 
of the sample. However, when the real experimental data is used, this kind of 
data can be problematic because the physical model error and experimental 
uncertainty tend to be magnified in such a condition. 

In Fig. 14, the pdfs of d e ff from ME sampling stay close to the lower 
bound until 24 t/l design stage. However, the pdfs from IM sampling start 
to shift toward the final estimate earlier (say, after it considers 26" 1 mea- 
surement that has a quite small volumetric flow rate). This explains why 
the sampling in ascending order considering th and I s * measurements first 
does the better job for this parameter. For the other parameters, 7jv and 
/3tv ; the large values of the inflow scenario parameters are preferable for get- 
ting more information due to enhancement of the chemical reactions. In the 
middle column of Fig. 14, the posterior pdfs of 7^ from IM sampling and 
ME sampling appreciably varies from Stage 6 (Fig. 14(b) to Stage 12 (Fig. 
14(e)). 

This can be partially understood by the experimental data uncertainties 
related to 13 measurement and 27 measurement. The authors recognize 
these measurements are problematic and probably bring quite different in- 
formation on this parameter. The scenarios of these experiments are almost 
identical. Nevertheless, the observed N-atom concentration is quite different 
(i.e., Xn,13/xn,27 ~ 1-6). As a result, the measurements of the mass loss 
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that are used for calibration significantly differ each other. In fact, the local 
peaks seen in Fig. 13(a) at 9 design stage from ME sampling and at 7 
design stage from IM sampling corresponds to these experiments. Moreover, 
the knurling feather seen in Fig. 13(b) appears to be related to some incon- 
sistency in the experimental measurement. For example, for ME sampling, 
the first peak around A th and 5 th design stage are related to 13 i/l and 6 th 
measurements. The problem can be substantiated by a simple analysis as it 
follows. From Eq. (26), we get the following relationship in an approximate 
manner: 

AM C « AtC N « AtP~x N , (27) 

where * is the averaged value along the sample and xn is the mole frac- 
tion (= CnRT/P). Using the linear interpolation for the pressure and 
concentration, we can roughly estimate xn an d P by averaging the ex- 
perimental measurements of P and x at two measurement locations. The 
estimated ratio of the mass loss for these experiment by Eq. (27) is given by 
AM[j q/ AM" c,i3 ~ 0.9. This is significantly different from the actual mea- 
surement, 0.7. The same kind of analysis explains the first peak appearing 
at 3 and 4 design stage, corresponding to 20 and 12 measurements, 
for IM sampling. (AM' C 2 q/ AM" c,i2 ~ 0.35 that is much larger than the 
measurement, 0.18). These simple analysis illustrates different experimental 
data provide the model with different information. As a consequence, Fig. 
13(b) is significantly different from Fig. 8(b) where the simulated data is 
perfectly consistent with the model itself. 
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6. Conclusions 

In this paper the experimental design problem is formulated in the 
Bayesian framework, and it is shown to be equivalent with an information 
theoretic sensitivity analysis between model parameters and observables. 
The optimal design selected by the information maximization sampling is 
the one which provides the highest statistical dependence between the two 
random variables. The statistical dependence is quantified using mutual in- 
formation which is approximated in this paper using a k nearest neighbor 
estimator. Theoretically the information maximization sampling is more 
general than the maximum entropy sampling and more efficient than ran- 
dom sampling. 

The entropy and Kullback-Leibler information metrics are used to mon- 
itor the learning process. A decreasing trend in entropy is indicative that 
the uncertainty in model parameters is reduced by assimilating each addi- 
tional measurement. The information gain brought by each measurement 
is quantified by the Kullback-Leibler divergence. Large values are indica- 
tive that the model is still learning. A large divergence between subsequent 
posterior distributions might be explained by a large uncertainty reduction 
or the existence of conflicting information between model predictions and 
measurements. When the trend in the Kullback-Leibler divergence is de- 
creasing one can stop the experimental process given a level of accuracy, 
otherwise one may still choose to stop the experimental process to check the 
formulation of the model, the probabilistic assumptions and/or analyze the 
collected measurements to find potential outliers. 
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Appendix A: Estimating the Kullback-Leibler divergence using 

fcNN 

Similar with the entropy and mutual information, the approximation of 
the Kullback-Leibler divergence is based on a k nearest neighbor approach 
as proposed in Wang et al. (2006). 

B«W*IA.)IWx|iV.) » t f>^f + log ^ (28) 

where dx is the dimensionality of the random variable X, N n and iV n _i 
give the number of samples {-X^|i = 1, . . . , N n } ~ p(x\D n ) and {X^_ ± \j = 
l,...,iV n _i} ~ p(x\D n -\) respectively, and the two distances v n -\{i) and 
p n (i) are defined as follows: 

p n (i) = . min \\Xi-XiWcc (29) 

j=l...N n ,j& 

"n-i(i) = . mm K-X^IU (30) 

j=l...JV„_i 

Appendix B: Information-theoretic sensitivity analysis 

Mutual information is a convenient way to quantify the statistical de- 
pendence between two random variables, X\ and X 2 - 

I(X 1 ,X 2 )= ff(x 1 ,x 2 )\og f f iX l) X2 ] r dxidx 2 (31) 

J fx 1 {Xl)fx 2 {X2) 

A more formal relation between statistical dependence and mutual in- 
formation can be shown with the help of copula functions as presented in 
Calsaverini and Vicente (2009), and included here for completeness. Sklar 
(1959) proved that the joint distribution of two random variables, F(xi,x 2 ), 
can be represented using a copula function, C : [0, l] 2 — > [0,1], and the 
marginals distributions, Fx t ' 

F(xi,x 2 ) = C(F Xl (xi),Fx a (x2)) (32) 
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Given that Fx 1 and Fx 2 are continuous, then Fx 1 (X\) and Fx 2 (X 2 ) are 
two uniformly distributed random variables. Thus, the copula function can 
be regarded as a joint cumulative distribution of two uniformly distributed 
random variables. It separates the study of the dependence between random 
variables and their marginal distributions. By denoting m = Fx i (xi), Eq. 
(32) can be written as follows: 

F(F x l( Ul ),F x l(u 2 )) = C( Ul ,u 2 ) (33) 

Assuming that the densities fx t exist, then the joint probability density 
function is given by: 

= c(Fx 1 {x 1 ),Fx a (x 2 ))fx 1 (x 1 )fx a {x2) (34) 

where c(F Xl (xi),F X2 (x 2 )) = dx % X2 C(F Xl (ffi), F x . 2 (x 2 )) is the copula den- 
sity. 

By substituting Eq. (34) in Eq. (31), and using that Ui = Fx^Xi), the 
mutual information can be rewritten in terms of copula density function. 



I(X 1 ,X 2 ) - J c(Fx 1 (xi),Fx 2 (x 2 ))fx 1 (xi)fx 2 (x 2 )logc(Fx 1 {xi),Fx 2 {x 2 ))dxidx 2 
c(ui,u 2 )logc(ui,u 2 )duidu 2 (35) 



Eq. (35) reveals that mutual information is the negative copula entropy. 
Thus it does not depend on the marginal distributions, it only quantifies the 
dependence between the random variables which is contained in the copula 
function. This has been pointed out in literature in Calsaverini and Vicente 
(2009). 



30 



Acknowledgments 

We thank Dr. Jochen Marschall from SRI International for valuable dis- 
cussions regarding the nitridation experiment. This material is based upon 
work supported by the Department of Energy [National Nuclear Security 
Administration] under Award Number [DE-FC52-08NA28615]. 

References 

K. Chaloner, I. Verdinelli, Bayesian Experimental Design: A Review, Sta- 
tistical Science 10(3) (1995) 273-304. 

C. Shannon, A Mathematical Theory of Communication, Bell System Tech- 
nical Journal 27 (July, October, 1948) 379-423, 623-656. 

D. V. Lindley, On a Measure of the Information Provided by an Experiment, 
Ann. Math. Statist. 27(4) (1956) 986-1005. 

V. Fedorov, Theory of optimal experiments, Academic press New York and 
London, 1972. 

T. Loredo, D. Chernoff, Bayesian Adaptive Exploration, Statistical Chal- 
lenges in Astronomy (2003) 57-70. 

A. DasGupta, Review of optimal Bayes designs, Tech. Rep., Purdue Univer- 
sity, 1995. 

P. Muller, Bayesian Statistics 6, chap. Simulation-based optimal design, Ox- 
ford University Press, 459-474, 1999. 

P. Sebastiani, H. P. Wynn, Bayesian Experimental Design and Shannon 
Information, in: Bayesian Statistical Science, American Statistical Asso- 
ciation, 176-181, 1997. 

31 



M. Shewry, H. Wynn, Maximum entropy sampling, Journal of Applied 
Statistics (1987) 165-170. 

A. Farhang-Mehr, S. Azarm, A sequential information-theoretic approach 
to design of computer experiments, in: AIAA, Atlanta, Georgia, AIAA 
2002-5571, 2002. 

R. Calsaverini, R. Vicente, An information-theoretic approach to statisti- 
cal dependence: Copula information, EPL (Europhysics Letters) 88 (6) 
(2009) 68003. 

S. H. Cheung, J. L. Beck, New Bayesian Updating Methodology for Model 
Validation and Robust Predictions of a Target System based on Hierar- 
chical Subsystem Tests., Computer Methods in Applied Mechanics and 
Engineering Accepted for publication. 

A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, 
Physical Review E 69 (6) (2004) 066138. 

L. F. Kozachenko, N. N. Leonenko, Sample Estimate of the Entropy of a 
Random Vector, Problems of Information Transmission 23(2) (1987) 9-16. 

T. Guest, A. Curtis, Iteratively constructive sequential design of experi- 
ments and surveys with nonlinear parameter-data relationships, Journal 
of Geophysical Research 114 (2009) B04307. 

L. Paninski, Asymptotic theory of information-theoretic experimental de- 
sign, Neural Computation 17 (2005) 1480-1507. 

M. Clyde, P. Muller, G. Parmigiani, Bayesian Biostatistics, chap. Infer- 
ence and Design Strategies for a Hierarchical Logistic Regression Model, 
Dekker, New York, 297-320, 1996. 

32 



M. Chung, E. Haber, Experimental design in biological systems, in revision 
at SIAM Journal on Control and Optimization, 2011 . 

L. Horesh, E. Haber, L. Tenorio, Large-Scale Inverse Problems and Quantifi- 
cation of Uncertainty, chap. Optimal Experimental Design for the Large- 
Scale Nonlinear Ill-posed Problem of Impedance Imaging, Wiley, 273-290, 
2011. 

M. A. A. Tucker, Application of Design of Experiments to Flight Test: A 
Case Study, in: AIAA, Los Angeles, California, 2008-847, 2008. 

A. Curtis, H. Maurer, Optimizing the design of geophysical experiments: Is 
it worthwhile?, EOS, Trans. A,. Geophys. Un. 81 (20) (2000) 224-225. 

L. Zhang, D. A. Pejakovic, J. Marschall, D. G. Fletcher, Laboratory Inves- 
tigation of Active Carbon Nitridation by Atomic Nitrogen, AIAA Paper 
2009-4251, AIAA, 2009. 

C. Park, D. W. Bogdanoff, Shock- Tube Measurement of Nitridation Coef- 
ficient of Solid Carbon, Journal of Thermophysics and Heat Transfer 20 
(2006) 487-492. 

H. W. Goldstein, The Reaction of Active Nitrogen with Graphite, Journal 
of Physical Chemistry 68 (1) (1964) 39-42. 

T. Suzuki, K. Fujita, K. Ando, T. Sakai, Experimental Study of Graphite 
Ablation in Nitrogen Flow, Journal of Thermophysics and Heat Transfer 
22 (2008) 382-389. 

E. E. Prudencio, K. W. Schulz, The Parallel C++ Statistical Library 
'QUESO': Quantification of Uncertainty for Estimation, Simulation and 
Optimization, Submitted to IEEE IPDPS . 

33 



S. H. Cheung, J. L. Beck, New Bayesian updating methodology for model 
validation and robust predictions based on data from hierarchical subsys- 
tem tests, EERL Report No. 2008-04, California Institute of Technology, 
2008. 

T. Cover, J. Thomas, Elements of Information Theory, New York: Wiley, 
1991. 

P. Sebastini, H. P. Wynn, Maximum entropy sampling and optimal Bayesian 
experimental design, J. R. Statist. Soc. B 62, Part 1 (2000) 145-157. 

S. Khan, S. Bandyopadhyay, A. R. Ganguly, S. Saigal, Relative performance 
of mutual information estimation methods for quantifying the dependence 
among short and noisy data, Physical Review E 76 (2007) 026209. 

A. O'Hagan, J. Forster, Kendall's Advanced Theory of Statistics. Bayesian 
Inference, v.2B, Oxford University Press Inc, New York, 79-80, 2004. 

Q. Wang, S. Kulkarni, S. Verdu, A Nearest-Neighbor Approach to Esti- 
mating Divergence between Continuous Random Vectors, in: Informa- 
tion Theory, 2006 IEEE International Symposium on, 242 -246, doi: 
10. 1109/ISIT. 2006. 261842, 2006. 

A. Sklar, Fonctions de repartition a n dimensions et leurs marges, Publica- 
tions de llnstitut de Statistique de lUniversite de Paris, Paris 8. 



34 



Sample Size — > 


500 


1,000 


2,000 


Dimensionality 


/eNN 


yfcNN 


kNN 


dim(X,Y) 


dim(X) dim(Y) 


3 


6 


9 


3 


6 


9 


3 


6 


9 


2 


1 1 


16% 


10% 


8% 


11% 


7% 


5% 


8% 


5% 


3% 


3 


2 1 


21% 


13% 


11% 


16% 


9% 


7% 


12% 


7% 


5% 


4 


2 2 


15% 


11% 


11% 


10% 


7% 


6% 


7% 


5% 


4% 


5 


3 2 


18% 


14% 


13% 


13% 


9% 


9% 


9% 


6% 


9% 



Table 1: Absolute Relative Error for MI averaged over 1000 trial runs 



Strategy 


Sequence 


Repeated measurements not allowed 


ME Sampling 


10 192385746 


IM Sampling 


1 10 92387465 


RND Sampling 


6 10 31598247 


Repeated measurements allowed 


ME Sampling 


10 1 10 1 1 10 10 1 1 1 


IM Sampling 


1 10 1 10 10 1 10 10 1 1 



Table 2: Results for example 1, the additive noise case. Design sequences yielded by the 
three strategies for one particular Monte Carlo run 
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Strategy 


Sequence 


Repeated measurements not allowed 


ME Sampling 


567413892 10 


IM Sampling 


1 10 97284356 


RND Sampling 


6185 10 94237 


Repeated measurements allowed 


ME Sampling 


5665556565 


IM Sampling 


1 10 1 10 10 1 1 1 1 10 



Table 3: Results for example 1, the multiplicative noise case. Design sequences yielded by 
the three strategies for one particular Monte Carlo run 



Strategy 


Sequence 


ME Sampling 


14 21 6 20 13 12 27 19 5 11 22 4 18 17 10 3 9 16 15 26 2 7 23 8 24 25 1 


IM Sampling 


21 5 20 1 13 27 14 2 12 15 16 11 22 19 9 10 3 4 6 26 25 23 7 8 24 18 17 


ASC Sampling 


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 



Table 4: Results for example 2 (simulated data). Design sequences yielded by the three 
strategies. 



Strategy 


Sequence 


ME Sampling 


14 21 12 13 6 4 20 5 27 11 3 22 19 16 15 18 17 10 9 2 26 7 8 23 24 1 25 


IM Sampling 


21 1 20 12 27 14 13 11 19 22 26 9 10 25 23 7 24 8 18 17 5 2 15 16 3 4 6 


ASC Sampling 


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 



Table 5: Results for example 3 (real data). Design sequences yielded by the three strate- 
gies. 
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Figure 1: Schematic of sequential experimental design process 
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(a) Initial response (additive noise) 



(b) Initial response (multiplicative noise) 



Figure 2: Results for example 1. Model response given initial discrepancy and parametric 
uncertainty. 
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Figure 3: The evolution of the entropy of parameter distribution averaged over 100 Monte 
Carlo runs for the three strategies: IM - Information Maximization Sampling, ME - Max- 
imum Entropy Sampling, RND - Random Sampling, and the two case: no repeated mea- 
surements and repeated measurements. 
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Figure 4: Results for example 1. Marginal pdfs of model parameters given by the three 
strategies after the second experimental design stage 
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Figure 5: Results for example 2 (simulated data). 3D plot for inflow scenarios used in 
Ref. Zhang et al. (2009). 
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(b) ME: Relative entropy 



Figure 6: Results for example 2 (simulated data). Relative expected utility for Information 
Maximization Sampling and Maximum Entropy Sampling. Darker shadings indicate large 
expected utilities (mutual information for IM or entropy for ME) relative to all available 
scenarios within a design stage. The # sign indicates which scenario has been selected at 
a given stage. Previously selected scenarios are excluded from subsequent analysis which 
is indicated here by white cells. 41 
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Figure 7: Results for example 2 (simulated data). Graphical representation for the inflow 
conditions selected by the two information theoretic strategies. Darker shadings indicate 
large inflow conditions such as flow, pressure, and concentration. 



42 









^IM 




-? 




ME 




-4 


V » A 


-•-ASC 




- 


a 

CD 


-6 


v * 


. 


a. 




i* * 




X 


-8 


i* » 

V i 


_ 




-10 




- 



10 

Jh 



15 



20 



25 



n design stage 



(a) Entropy 



a 

© 
a 

^ 2 

c 

a 

a 




IM 

--ME 
■•-ASC 
-•-0.5 nats 



10 



15 



20 



25 



n design stage 



(b) Kullback Leibler divergence 

Figure 8: Results for example 2 (simulated data). The evolution of the entropy and 
the KL divergence given the posterior distributions yielded by the three strategies for 
experimental design: IM - Information Maximization Sampling, ME - Maximum Entropy 
Sampling and ASC - Ascending Sampling 
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Figure 9: Results for example 2 (simulated data). Marginal pdfs of model parameters 
given by the three strategies after a selected number of experimental design stages 
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Figure 10: Schematic of the experimental setup used in Zhang et al. (2009). Also shown 
in the inset is the sample and sample holder 
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Figure 11: Results for example 3 (real data). Relative expected utility for Information 
Maximization Sampling and Maximum Entropy Sampling. Darker shadings indicate large 
expected utilities (mutual information for IM or entropy for ME) relative to all available 
scenarios within a design stage. The # sign indicates which scenario has been selected at 
a given stage. Previously selected scenarios are excluded from subsequent analysis which 
is indicated here by white cells. 46 
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Figure 12: Results for example 3 (real data). Graphical representation for the inflow 
conditions selected by the two information theoretic strategies. Darker shadings indicate 
large inflow conditions such as flow, pressure, and concentration. 
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Figure 13: Results for example 3 (real data). The evolution of the entropy and the KL 
divergence given the posterior distributions yielded by the three strategies for experimental 
design: IM - Information Maximization Sampling, ME - Maximum Entropy Sampling and 
ASC - Ascending Sampling 
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Figure 14: Results for example 3 (real data). Marginal pdfs of model parameters given 
by the three strategies after a selected number of experimental design stages 



